Why is that? Structural prediction and ambiguity resolution in a very large corpus of English sentences

Authors

  • Douglas Roland
  • Jeffrey L. Elman
  • Victor S. Ferreira
Abstract

Previous psycholinguistic research has shown that a variety of contextual factors can influence the interpretation of syntactically ambiguous structures, but psycholinguistic experimentation inherently does not allow for the investigation of the role that these factors play in natural (uncontrolled) language use. We use regression modeling in conjunction with data from the British National Corpus to measure the amount and specificity of the information available for disambiguation in natural language use. We examine the Direct Object/Sentential Complement ambiguity and the closely related issue of complementizer use in sentential complements, and find that both ambiguity resolution and complementizer use can be predicted from contextual information.

Cognition (in press)

From a certain perspective, linguistic expressions include massive lexical, structural, and acoustic ambiguity. Even when the ambiguities are ultimately resolved by subsequent information (e.g., at the end of a sentence), the incremental nature of most language processing suggests that comprehenders must deal with even temporary ambiguities during the course of sentence comprehension. Yet we seem to understand linguistic expressions with comparative ease, scarcely even noticing any ambiguities at all.

The ways in which comprehenders resolve ambiguity have been a major focus of research in sentence processing. According to constraint-based accounts of language processing (e.g., Altmann, 1998; Altmann, 1999; MacDonald, Pearlmutter, & Seidenberg, 1994; MacWhinney & Bates, 1989; Spivey & Tanenhaus, 1998), ambiguity is resolved through the interaction of multiple sources of information contained in linguistic expressions.
Alternatively, serial accounts of processing (e.g., Frazier, 1978) argue that the initial interpretations of utterances are based solely on syntactic information, and that these interpretations are later revised based on subsequent consideration of other information in the input.

Much of the evidence about the availability and use of information comes from controlled psycholinguistic experiments that typically examine one or two factors at a time. The high degree of control used in experimental designs is essential for determining if and when some theoretically important potential source of information can influence ambiguity resolution. However, this sort of methodology leaves open a number of questions: Just how ambiguous are linguistic expressions? How much information is available for resolving ambiguity in typical naturally occurring contexts? How large a role do the experimentally studied factors play in natural (uncontrolled) language use and comprehension? Are there other factors that play a significant role in processing? What sorts of interactions occur when a larger number of factors are explored?

Here, we adopt an approach that is complementary to an experimental one. Rather than addressing the question of if and when a particular source of information is used during comprehension, we use corpus data to examine the quantity and variety of information available in normal language use, and to investigate sources of information that would otherwise be missed in the carefully controlled environment of psycholinguistic experiments. Indeed, this approach has the potential to provide information that is relevant to both constraint-based and syntax-first models of sentence processing.
For a syntax-first approach, our results will indicate the degree to which syntax-only heuristics such as minimal attachment can correctly resolve ambiguity, and the extent to which the predictions of the first stage need to be revised by a more general second stage. Our results will also indicate what information is used to revise the initial syntax-based predictions. For a constraint-based approach, the analysis of the information available in naturally occurring data provides an indication of the relative importance of factors that are typically investigated in separate experiments, as well as the interaction among these factors.

Knowing how much information is available during normal language comprehension also shows the extent to which the set of factors under consideration can account for language processing. If it turns out that a limited set of factors can be used to resolve nearly all naturally occurring ambiguity, it would suggest that a complex system employing a wide variety of factors may be unnecessarily complicated. On the other hand, if the naturally occurring contexts turn out to be information-poor, it could suggest that an alternate mechanism for ambiguity resolution might be needed.

In order to investigate the amount and variety of information available during natural language comprehension, we identify a relatively large number of general factors that might affect language performance, and investigate those influences using correlational techniques in a very large corpus of naturally produced sentences. We performed three separate analyses on our corpus using regression models. First, in Study 1, we investigated the amount and type of information available for predicting the resolution of the commonly studied Direct Object / Sentential Complement (DO/SC) ambiguity.
The DO/SC ambiguity occurs in sentence fragments such as example (1), where the post-verbal noun phrase can either be the direct object of the verb, as in (2), or the subject of a sentential complement, as in (3).

(1) The athlete realized her goals...
(2) The athlete realized her goals through hard training.
(3) The athlete realized her goals would be difficult to achieve.

The main finding of this first set of analyses is that there is abundant information available for resolving the DO/SC ambiguity before the point normally considered to be the disambiguation point, the embedded verb. Given this, we investigate the level of specificity of this information in Study 2. In Study 3, we investigate whether our success in predicting the resolution of the DO/SC ambiguity was in part due to an effort on the part of the producers of the language contained in the corpus data to avoid ambiguity in the first place.

Study 1: Predicting DO/SC-0 Subcategorization

The goal of the first experiment was to provide a lower bound estimate for the amount of information available for disambiguating ambiguous DO/SC cases during natural sentence processing. This lower bound can alternatively be interpreted as an upper bound on the amount of potential ambiguity during normal comprehension. A secondary goal was to see which types of information are most useful, and which types of information are potentially misleading.
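To make the general idea of predicting DO/SC resolution from contextual information concrete, the following is a minimal sketch of a logistic regression fitted by gradient descent. It is an illustration under stated assumptions, not the models used in the studies: the text above says only that regression models were applied to corpus data, so the choice of logistic regression, the two predictors (`verb_sc_bias` and `np_is_short`), and the six toy observations are all hypothetical and do not come from the British National Corpus.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Probability of an SC continuation; the last weight is the intercept."""
    z = weights[-1] + sum(w * xj for w, xj in zip(weights, x))
    return sigmoid(z)

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err  # gradient for the intercept
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy data (invented for illustration only).
# Each row is (verb_sc_bias, np_is_short); label 1 = SC, 0 = DO.
rows = [(0.9, 1.0), (0.8, 0.0), (0.7, 1.0),
        (0.2, 0.0), (0.1, 0.0), (0.3, 1.0)]
labels = [1, 1, 1, 0, 0, 0]

weights = fit_logistic(rows, labels)
p_sc_high_bias = predict_proba(weights, (0.9, 1.0))
p_sc_low_bias = predict_proba(weights, (0.1, 0.0))
print(p_sc_high_bias > 0.5, p_sc_low_bias < 0.5)
```

In a sketch of this kind, the share of cases the fitted model classifies correctly from pre-disambiguation context is one way to estimate how much disambiguating information the chosen predictors carry.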

Similar resources

Semantic Priming Effect on Relative Clause Attachment Ambiguity Resolution in L2

This study examined whether processing ambiguous sentences containing relative clauses (RCs) following a complex determiner phrase (DP) by Persian-speaking learners of L2 English with different proficiency and working memory capacities (WMCs) is affected by semantic priming. The semantic relationship studied was one between the subject/verb of the main clause and one of the DPs in the complex D...

Relative clause attachment ambiguity resolution in Persian

The present study seeks to find the way Persian native speakers resolve relative clause attachment ambiguities in sentences containing a complex NP of the type NP of NP followed by a relative clause (RC). Previous off-line studies have found a preference for high attachment. In the present study, an on-line technique was used to help identify the nature of this process. Persian speakers were pre...

Relative Clause Ambiguity Resolution in L1 and L2: Are Processing Strategies Transferred?

This study aims at investigating whether Persian native speakers highly advanced in English as a second language (L2ers) can switch to optimal processing strategies in the languages they know and whether working memory capacity (WMC) plays a role in this respect. To this end, using a self-paced reading task, we examined the processing strategies 62 Persian speaking proficient L2ers used to read...

Self-Regulation, Goal Orientation, Tolerance of Ambiguity and Autonomy as Predictors of Iranian EFL learners’ Second Language Achievement: A Structural Equation Modeling Approach

The identification of the cognitive, affective, social and even physiological factors affecting second or foreign language learning routes and rate has for long been a challenging aspiration for second language researchers. However, a recent preoccupation of the researchers in this area has been the study of the combinatorial impacts of such factors on second or foreign language learning proces...

Publication date: 2004